Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation
This paper demonstrates that word sense disambiguation (WSD) can improve
neural machine translation (NMT) by widening the source context considered when
modeling the senses of potentially ambiguous words. We first introduce three
adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant
processes, and random walks, which are then applied to large word contexts
represented in a low-rank space and evaluated on SemEval shared-task data. We
then learn word vectors jointly with sense vectors defined by our best WSD
method, within a state-of-the-art NMT system. We show that concatenating these
vectors, together with a sense selection mechanism based on the weighted average
of sense vectors, outperforms several baselines, including sense-aware ones, on
translation for five language pairs. The improvements exceed one BLEU point over
strong NMT baselines, and amount to +4% accuracy over all ambiguous nouns and
verbs, or +20% when scored manually on several challenging words.
Comment: To appear in TAC
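The sense selection step described above (a weighted average of sense vectors, concatenated with the word vector) can be illustrated with a small sketch. The abstract does not say how the weights are computed. This sketch assumes a softmax over dot-product similarities between each sense vector and a context vector, which is a common attention-style choice but not necessarily the authors' mechanism. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sense_aware_embedding(
    word_vec: Sequence[float],
    sense_vecs: Sequence[Sequence[float]],
    context_vec: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Hypothetical sketch: weight each sense by its (assumed) softmax similarity
    to the context, average the sense vectors with those weights, and
    concatenate the result to the word vector."""
    weights = softmax([dot(s, context_vec) for s in sense_vecs])
    dim = len(sense_vecs[0])
    blended = [sum(w * s[i] for w, s in zip(weights, sense_vecs)) for i in range(dim)]
    return list(word_vec) + blended, weights


# Toy usage: two hypothetical senses; the context points towards the first one.
word = [0.5, -0.2]
senses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
context = [2.0, 0.0, 0.0]
emb, w = sense_aware_embedding(word, senses, context)
print(emb, w)
```

Using a soft weighted average rather than picking the single most similar sense keeps the operation differentiable, so under this assumption the sense vectors could be trained jointly with the rest of the translation model.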
Reference-based vs. task-based evaluation of human language technology
This paper starts from the ISO distinction between three types of evaluation procedures (internal, external, and in use) and proposes to match them to the three types of human language technology (HLT) systems: analysis, generation, and interactive. The paper explains why internal evaluation is not suitable for measuring the qualities of HLT systems, and shows that reference-based external evaluation is best suited to "analysis" systems and task-based evaluation to "interactive" systems, while "generation" systems can be subject to both types of evaluation. In particular, some limits of reference-based external evaluation are shown for generation systems. Finally, the paper shows that contextual evaluation, as illustrated by the FEMTI framework for MT evaluation, is an effective way to bring reference-based evaluation closer to the users of a system.
Comparing meeting browsers using a task-based evaluation method
Meeting browsers facilitate access to information in meeting recordings, which may be transcribed and augmented with other media. To evaluate their performance through a shared benchmark task, users are asked to discriminate between true and false parallel statements about facts in meetings, using different browsers. This paper reviews the results obtained so far with five types of meeting browsers, using similar sets of statements over the same meeting recordings. The results indicate that state-of-the-art speed for true/false question answering is 1.5 to 2 minutes per question, with a precision of 70% to 80% (vs. 50% for random guessing). Using ASR instead of manual transcripts, or using audio signals only, leads to a perceptible though not dramatic decrease in performance scores.
Dimensionality of Dialogue Act Tagsets: An Empirical Analysis of Large Corpora
This article compares one-dimensional and multi-dimensional dialogue act tagsets used for the automatic labeling of utterances. The influence of tagset dimensionality on tagging accuracy is first discussed theoretically, then empirically, based on human and automatic annotations of large-scale resources using four existing tagsets: DAMSL, SWBD-DAMSL, ICSI-MRDA and MALTUS. The Dominant Function Approximation proposes that automatic dialogue act taggers could initially focus on finding the main dialogue function of each utterance, an approach that is empirically acceptable and of significant practical relevance.
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
The goal of this work is to design a machine translation (MT) system for a
low-resource family of dialects, collectively known as Swiss German, which are
widely spoken in Switzerland but seldom written. As a starting point, we
collected parallel written resources totaling about 60k words. Moreover, we
identified several other promising data sources for Swiss German. We then
designed and compared three strategies for normalizing Swiss German input in
order to address its regional diversity. We found that character-based neural
MT was the best solution for text normalization. In combination with
phrase-based statistical MT, our solution reached a BLEU score of 36% when
translating from the Bernese dialect. This value, however, decreases as the test
data becomes more remote from the training data, both geographically and
topically. These resources and normalization techniques are a first step
towards full MT of Swiss German dialects.
Comment: 11th Language Resources and Evaluation Conference (LREC), 7-12 May
2018, Miyazaki (Japan)